Recall the Gaussian sequence model: $Y_i \sim \mathcal{N}(\theta_i, \sigma^2)$ independently for $i = 1, \dots, n$. The goal is to estimate $\theta$ via $\hat\theta(Y)$ with low MSE:
$$\mathrm{MSE}(\hat\theta; \theta) = \mathbb{E}_\theta\|\hat\theta(Y) - \theta\|^2 = \sum_{i=1}^n \mathbb{E}_\theta(\hat\theta_i - \theta_i)^2.$$
The model is more general than it appears. For example, if $X_{ij} \sim \mathcal{N}(\theta_i, \sigma_0^2)$ for $j = 1, \dots, m$ with $\sigma_0^2$ known, we could make a sufficiency reduction to obtain $Y_i = \frac{1}{m}\sum_{j=1}^m X_{ij} \sim \mathcal{N}(\theta_i, \sigma_0^2/m)$.
1.1 Bayes Estimators
If we introduce the Bayesian prior $\theta_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \tau^2)$, then the Bayes estimator is $\hat\theta_i = \frac{\tau^2}{\tau^2 + \sigma^2}\,Y_i$.
We can think of this as a special case of a generic linear shrinkage estimator $\hat\theta_c = c\,Y$, where $c \in [0, 1]$ is in effect a tuning parameter we will call the shrinkage parameter. Taking $c = \frac{\tau^2}{\tau^2 + \sigma^2}$ corresponds to the Bayes estimator.
If we aren't sure which $c$ to use (i.e., we have a priori uncertainty about $\tau^2$), we can use empirical Bayes: we are in effect estimating $\tau^2$ from the whole data set and then plugging it in as a data-adaptive tuning parameter.
If $\theta_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \tau^2)$ then marginally $Y_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \tau^2 + \sigma^2)$, and the UMVUE for $\frac{1}{\tau^2 + \sigma^2}$ is $\frac{n-2}{\|Y\|^2}$, based on the fact that $\frac{\|Y\|^2}{\tau^2 + \sigma^2} \sim \chi^2_n$ and $\mathbb{E}\left[\frac{1}{\chi^2_n}\right] = \frac{1}{n-2}$.
Plugging in $\frac{(n-2)\sigma^2}{\|Y\|^2}$ as an estimate of $1 - c = \frac{\sigma^2}{\tau^2 + \sigma^2}$ results in an estimator called the James-Stein estimator:
$$\hat\theta^{\mathrm{JS}} = \left(1 - \frac{(n-2)\sigma^2}{\|Y\|^2}\right) Y.$$
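As a quick sketch, the James-Stein formula above translates directly into NumPy (the function name `james_stein` is our own choice, not from any library):

```python
import numpy as np

def james_stein(y, sigma2=1.0):
    """James-Stein estimator shrinking toward 0 (requires n >= 3).

    Computes (1 - (n - 2) * sigma2 / ||y||^2) * y, matching the
    empirical Bayes plug-in derivation above.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("James-Stein requires n >= 3")
    shrink = 1.0 - (n - 2) * sigma2 / np.sum(y**2)
    return shrink * y
```

Note that the shrinkage factor can go negative when $\|y\|^2 < (n-2)\sigma^2$; clipping it at zero gives the positive-part James-Stein estimator, which performs even better.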
1.2 James-Stein Paradox
While the James-Stein estimator can be motivated as an empirical Bayes estimator, it performs well even without making any Bayesian assumptions at all.
For $n \geq 3$, the estimator $\hat\theta = Y$ is actually inadmissible as an estimator of $\theta$:
$$\mathbb{E}_\theta\|\hat\theta^{\mathrm{JS}} - \theta\|^2 < \mathbb{E}_\theta\|Y - \theta\|^2 = n\sigma^2 \quad \text{for all } \theta \in \mathbb{R}^n.$$
This is surprising because $\hat\theta^{\mathrm{JS}}$ not only beats the UMVUE on average over some prior, but at every fixed value of $\theta$.
In fact, we can use an estimator shrinking towards any fixed $\mu \in \mathbb{R}^n$:
$$\hat\theta^{\mathrm{JS}}_\mu = \mu + \left(1 - \frac{(n-2)\sigma^2}{\|Y - \mu\|^2}\right)(Y - \mu).$$
This also dominates $Y$, because it is just the James-Stein estimator we'd get if we substitute $Y - \mu$ for $Y$ and $\theta - \mu$ for $\theta$.
By the translation invariance of the Gaussian location model, the James-Stein estimator for $\theta - \mu$ based on $Y - \mu$ also dominates $Y$, i.e. $\mathbb{E}_\theta\|\hat\theta^{\mathrm{JS}}_\mu - \theta\|^2 < n\sigma^2$ for all $\theta \in \mathbb{R}^n$.
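The domination claim is easy to probe by simulation. Below is a minimal Monte Carlo sketch at one arbitrarily chosen fixed $\theta$ (the particular `theta`, `mu`, and sample sizes are our own illustrative choices):

```python
import numpy as np

# Monte Carlo check that James-Stein (shrinking toward a fixed mu)
# beats the MLE Y at a fixed theta -- no prior involved.
rng = np.random.default_rng(0)
n, sigma2, reps = 20, 1.0, 4000
theta = np.linspace(-1.0, 1.0, n)   # an arbitrary fixed truth
mu = np.zeros(n)                    # fixed shrinkage target

loss_mle, loss_js = 0.0, 0.0
for _ in range(reps):
    y = theta + rng.standard_normal(n)
    d = y - mu
    js = mu + (1.0 - (n - 2) * sigma2 / np.sum(d**2)) * d
    loss_mle += np.sum((y - theta) ** 2)
    loss_js += np.sum((js - theta) ** 2)

mse_mle = loss_mle / reps   # should be close to n * sigma2 = 20
mse_js = loss_js / reps     # strictly smaller, at every theta
```

Repeating this for other values of `theta` (near or far from `mu`) shows the improvement is largest when $\theta$ is close to the shrinkage target.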
1.3 Linear Shrinkage Estimators
Even without introducing a Bayesian prior for $\theta$, we can motivate our linear shrinkage estimator $\hat\theta_c = c\,Y$ purely from the perspective of trading bias for a reduction in variance. Calculate the MSE:
$$\mathbb{E}_\theta\|cY - \theta\|^2 = c^2 n\sigma^2 + (1-c)^2\|\theta\|^2.$$
So let
$$c^* = \arg\min_c \left\{ c^2 n\sigma^2 + (1-c)^2\|\theta\|^2 \right\} = \frac{\|\theta\|^2}{n\sigma^2 + \|\theta\|^2}.$$
This looks similar to $\frac{\tau^2}{\tau^2 + \sigma^2}$ (the Bayes-optimal shrinkage under the Gaussian prior $\theta_i \sim \mathcal{N}(0, \tau^2)$), with $\|\theta\|^2/n$ playing the role of $\tau^2$.
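The closed form for $c^*$ can be sanity-checked numerically by minimizing the MSE expression over a fine grid (the specific `n`, `sigma2`, and `theta_norm2` values are hypothetical):

```python
import numpy as np

# Numerically verify that c* = ||theta||^2 / (n*sigma2 + ||theta||^2)
# minimizes MSE(c) = c^2 * n * sigma2 + (1 - c)^2 * ||theta||^2.
n, sigma2 = 10, 1.0
theta_norm2 = 25.0                    # hypothetical value of ||theta||^2

grid = np.linspace(0.0, 1.0, 100001)
mse = grid**2 * n * sigma2 + (1 - grid)**2 * theta_norm2
c_grid = grid[np.argmin(mse)]
c_star = theta_norm2 / (n * sigma2 + theta_norm2)   # closed form: 25/35
```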
2 SURE
Theorem (Stein's Lemma)
Suppose $X \sim \mathcal{N}(\mu, \sigma^2)$ and $g: \mathbb{R} \to \mathbb{R}$ is differentiable, with $\mathbb{E}|g'(X)| < \infty$. Then
$$\mathbb{E}[(X - \mu)\,g(X)] = \sigma^2\,\mathbb{E}[g'(X)].$$
Proof
First consider $\mu = 0$, $\sigma^2 = 1$, i.e. $X = Z \sim \mathcal{N}(0, 1)$ with density $\phi(z)$. Then (by the fact that $\phi'(z) = -z\,\phi(z)$, integrating by parts with vanishing boundary terms)
$$\mathbb{E}[g'(Z)] = \int g'(z)\,\phi(z)\,dz = -\int g(z)\,\phi'(z)\,dz = \int g(z)\,z\,\phi(z)\,dz = \mathbb{E}[Z\,g(Z)].$$
For general $\mu, \sigma^2$, let $X = \mu + \sigma Z$, where $Z \sim \mathcal{N}(0, 1)$; then, applying the above result to $h(z) = g(\mu + \sigma z)$ (so $h'(z) = \sigma\,g'(\mu + \sigma z)$):
$$\mathbb{E}[(X - \mu)\,g(X)] = \sigma\,\mathbb{E}[Z\,h(Z)] = \sigma\,\mathbb{E}[h'(Z)] = \sigma^2\,\mathbb{E}[g'(X)].$$
Now consider the multivariate version. For a function $g: \mathbb{R}^n \to \mathbb{R}^n$, define the Jacobian matrix $Dg(y)_{ij} = \frac{\partial g_i(y)}{\partial y_j}$ and the Frobenius norm as $\|A\|_F^2 = \sum_{i,j} A_{ij}^2$.
Apply Stein's lemma coordinatewise to an estimator of the form $\hat\theta(Y) = Y + g(Y)$; then
$$\mathbb{E}\|\hat\theta - \theta\|^2 = \mathbb{E}\|Y - \theta\|^2 + 2\,\mathbb{E}\langle Y - \theta, g(Y)\rangle + \mathbb{E}\|g(Y)\|^2 = n\sigma^2 + 2\sigma^2\,\mathbb{E}[\nabla\cdot g(Y)] + \mathbb{E}\|g(Y)\|^2,$$
where $\nabla\cdot g = \sum_i \frac{\partial g_i}{\partial y_i} = \mathrm{tr}(Dg)$ is the divergence of $g$.
(Note that we assume $Y \sim \mathcal{N}(\theta, \sigma^2 I_n)$ now.)
So if $\sigma^2$ is known, we obtain the unbiased risk estimator
$$\widehat{\mathrm{MSE}}(Y) = n\sigma^2 + 2\sigma^2\,\nabla\cdot g(Y) + \|g(Y)\|^2. \tag{2.3}$$
We call it Stein's Unbiased Risk Estimator (SURE).
Example (Shrinking toward )
We define an estimator that shrinks partway toward the average estimate across the coordinates:
$$\hat\theta_c = \bar Y \mathbf{1} + c\,(Y - \bar Y \mathbf{1}), \qquad \bar Y = \frac{1}{n}\sum_{i=1}^n Y_i$$
(betting that most of the $\theta_i$ are close to $\bar\theta$). Then $g(Y) = \hat\theta_c - Y = -(1-c)(Y - \bar Y \mathbf{1})$ ($\mathbf{1}$ is a vector with all $1$s), and
$$\nabla\cdot g(Y) = -(1-c)(n-1), \qquad \|g(Y)\|^2 = (1-c)^2 \sum_{i=1}^n (Y_i - \bar Y)^2.$$
So an unbiased estimator for the MSE of $\hat\theta_c$ is
$$\mathrm{SURE}(c) = n\sigma^2 - 2\sigma^2(1-c)(n-1) + (1-c)^2(n-1)S^2,$$
where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2$ is the sample variance.
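Unbiasedness of $\mathrm{SURE}(c)$ can be checked by simulation: averaged over many draws, it should match the empirical MSE at any fixed $\theta$ and $c$ (the particular `theta`, `c`, and sizes below are illustrative choices):

```python
import numpy as np

# Monte Carlo check that SURE(c) is unbiased for the MSE of the
# shrink-toward-the-mean estimator theta_hat_c = Ybar + c*(Y - Ybar).
rng = np.random.default_rng(1)
n, sigma2, c, reps = 20, 1.0, 0.5, 20000
theta = np.linspace(0.0, 2.0, n)      # an arbitrary fixed truth

sure_vals = np.empty(reps)
loss_vals = np.empty(reps)
for r in range(reps):
    y = theta + rng.standard_normal(n)
    ybar = y.mean()
    s2 = np.sum((y - ybar) ** 2) / (n - 1)    # sample variance S^2
    est = ybar + c * (y - ybar)
    sure_vals[r] = (n * sigma2 - 2 * sigma2 * (1 - c) * (n - 1)
                    + (1 - c) ** 2 * (n - 1) * s2)
    loss_vals[r] = np.sum((est - theta) ** 2)
```

The two sample means agree up to Monte Carlo error, illustrating $\mathbb{E}[\mathrm{SURE}(c)] = \mathrm{MSE}(\hat\theta_c; \theta)$.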
We can use this estimator in two different ways.
First, we can calculate the actual MSE of $\hat\theta_c$ by taking the estimator's expectation. The only random variable in $\mathrm{SURE}(c)$ is $S^2$. Write $\bar\theta = \frac{1}{n}\sum_i \theta_i$ and $\tau^2 = \frac{1}{n-1}\sum_i (\theta_i - \bar\theta)^2$; then $\mathbb{E}[S^2] = \sigma^2 + \tau^2$. Plugging in this expectation,
$$\mathrm{MSE}(\hat\theta_c; \theta) = n\sigma^2 - 2\sigma^2(1-c)(n-1) + (1-c)^2(n-1)(\sigma^2 + \tau^2),$$
so the minimizer is $c^* = \frac{\tau^2}{\sigma^2 + \tau^2}$.
When $\tau^2 = 0$, all $\theta_i$ values are equal, and we should set $c^* = 0$ (shrink fully to the sample mean); if $\tau^2 \gg \sigma^2$, we should take $c^* \approx 1$ (shrink very little).
Second, we can choose $c$ adaptively to minimize this estimator: take
$$\hat c = \arg\min_c \mathrm{SURE}(c) = 1 - \frac{\sigma^2}{S^2},$$
which we could think of as an estimator of $c^* = \frac{\tau^2}{\sigma^2 + \tau^2}$, since $S^2$ is unbiased for $\sigma^2 + \tau^2$. If we plug in $\hat c$ we get a new adaptive shrinkage estimator
$$\hat\theta_{\hat c} = \bar Y \mathbf{1} + \left(1 - \frac{\sigma^2}{S^2}\right)(Y - \bar Y \mathbf{1}),$$
which is not the same as $\hat\theta_c$ for any fixed $c$, and we could use the same idea to calculate its MSE if we wanted to.
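The SURE-tuned estimator is a few lines of NumPy; this is a sketch under the known-$\sigma^2$ assumption above, and the function name is our own:

```python
import numpy as np

def adaptive_shrink_to_mean(y, sigma2=1.0):
    """SURE-tuned shrinkage toward the sample mean.

    Plugs c_hat = 1 - sigma2 / S^2 into
    theta_hat = Ybar + c_hat * (Y - Ybar).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s2 = np.sum((y - ybar) ** 2) / (n - 1)   # sample variance S^2
    c_hat = 1.0 - sigma2 / s2                # negative if S^2 < sigma2
    return ybar + c_hat * (y - ybar)
```

As with James-Stein, $\hat c$ can be negative when the data look less spread out than pure noise; clipping $\hat c$ at $0$ is the natural positive-part fix.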
3 Risk of the James-Stein Estimator
Now we calculate the risk of the James-Stein estimator $\hat\theta^{\mathrm{JS}} = \left(1 - \frac{(n-2)\sigma^2}{\|Y\|^2}\right) Y$.
First assume $\sigma^2 = 1$. As above, denote $g(Y) = -\frac{n-2}{\|Y\|^2}\,Y$; by $\frac{\partial}{\partial y_i}\frac{1}{\|Y\|^2} = -\frac{2 y_i}{\|Y\|^4}$ we have
$$\frac{\partial g_i}{\partial y_i} = -(n-2)\left(\frac{1}{\|Y\|^2} - \frac{2 y_i^2}{\|Y\|^4}\right),$$
then
$$\nabla\cdot g(Y) = -(n-2)\left(\frac{n}{\|Y\|^2} - \frac{2}{\|Y\|^2}\right) = -\frac{(n-2)^2}{\|Y\|^2}, \qquad \|g(Y)\|^2 = \frac{(n-2)^2}{\|Y\|^2}.$$
Plug in (2.3):
$$\widehat{\mathrm{MSE}}(Y) = n - \frac{2(n-2)^2}{\|Y\|^2} + \frac{(n-2)^2}{\|Y\|^2} = n - \frac{(n-2)^2}{\|Y\|^2}.$$
Taking expectations:
$$\mathrm{MSE}(\hat\theta^{\mathrm{JS}}; \theta) = n - (n-2)^2\,\mathbb{E}_\theta\left[\frac{1}{\|Y\|^2}\right] < n.$$
If $\theta = 0$, then $\|Y\|^2 \sim \chi^2_n$, and by the fact mentioned earlier, $\mathbb{E}\left[\frac{1}{\chi^2_n}\right] = \frac{1}{n-2}$, so the MSE is $n - (n-2) = 2$. The total MSE does not rise with $n$!
On the other hand, suppose $\|\theta\| \to \infty$; then $\mathbb{E}_\theta\left[\frac{1}{\|Y\|^2}\right] \to 0$, so the MSE will be driven to $n$.
For general $\sigma^2$, similarly, the James-Stein estimator is $\hat\theta^{\mathrm{JS}} = \left(1 - \frac{(n-2)\sigma^2}{\|Y\|^2}\right) Y$, and
$$\mathrm{MSE}(\hat\theta^{\mathrm{JS}}; \theta) = n\sigma^2 - (n-2)^2\sigma^4\,\mathbb{E}_\theta\left[\frac{1}{\|Y\|^2}\right].$$
Then at $\theta = 0$, $\|Y\|^2 \sim \sigma^2\chi^2_n$, and the MSE is $n\sigma^2 - (n-2)\sigma^2 = 2\sigma^2$.
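The striking $\theta = 0$ calculation (risk $\approx 2\sigma^2$ regardless of $n$, versus $n\sigma^2$ for the MLE) is easy to reproduce by simulation; the sizes below are illustrative:

```python
import numpy as np

# Monte Carlo check: at theta = 0 the James-Stein risk is about
# 2 * sigma2, independent of n (versus n * sigma2 for the MLE).
rng = np.random.default_rng(2)
n, sigma2, reps = 50, 1.0, 5000

loss = 0.0
for _ in range(reps):
    y = np.sqrt(sigma2) * rng.standard_normal(n)    # theta = 0
    js = (1.0 - (n - 2) * sigma2 / np.sum(y**2)) * y
    loss += np.sum(js**2)
mse_js = loss / reps    # theory: n - (n - 2) = 2 when sigma2 = 1
```

Rerunning with larger `n` leaves the estimate near $2$, while the MLE's risk $n\sigma^2$ grows linearly.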